1. PREVIOUS WORK IN VOICE TECHNOLOGY

Bolt Beranek and Newman Inc. has engaged in research, devel-
opment, and consulting on a broad spectrum of speech-related problems
for over two decades. We have done work in at least the following
areas:


• speech signal processing 

• automatic speech recognition 

• continuous speech understanding 

• speaker recognition 

• speech compression 

• subjective and objective evaluation of speech communication 
systems 

• measurement of the intelligibility and quality of speech 
when degraded by noise or other masking stimuli 

• speech synthesis 

• instructional aids for second-language learning and for 
training of the deaf 

• investigation of speech correlates of psychological stress 

In addition to these speech-related areas, we also work in experimental 
psychology, control systems, and human factors engineering, which are 
often relevant to the proper design and operation of speech systems. 

The review of BBN's past and present speech-related projects 
presented below should not be regarded as delimiting our expertise or 
research interests. Given our role as an R&D and consulting firm,
they represent only specific places where our expertise and interests
have intersected with needs of our clients.

1.1 Speech Understanding

BBN was a principal participant in the recent five-year 
Speech Understanding Research (SUR) project, sponsored by the Advanced 
Research Projects Agency (ARPA) of the Department of Defense. The 
objective of the SUR project research was to discover, evaluate, and 
to incorporate into a total system, techniques for using higher level 
linguistic constraints and advanced signal processing and acoustic- 
phonetic analysis to determine the best possible interpretation of an 
unknown speech utterance. These speech understanding systems were to: 

"... accept continuous speech from many cooperative 
speakers of the General American dialect, in a quiet 
room over a good quality microphone, allowing slight 
tuning of the system per speaker, but requiring only 
natural adaptation by the user, permitting a slightly 
selected vocabulary of 1000 words, with a highly
artificial syntax and a (well defined) task
tolerating less than 10% semantic error, in a few
times real time (on a 100 MIPS machine), and be
demonstrable in 1976 with a moderate chance of success."

BBN's speech understanding system, called HWIM (for Hear What
I Mean), is a powerful research system for exploring alternative con-
trol strategies and the effects of different system features. We have
used this system to develop some powerful speech understanding algo- 
rithms. System components include: 

a) A linear predictive coding signal analysis component, which derives 
smooth spectral parameters, formant and pitch tracks, and other 
parametric information from the input speech waveform, 

b) An acoustic-phonetic recognition component, which segments the 
acoustic input into a lattice of alternative possible phonetic 
labelings of the input, 

c) An off-line dictionary generation component, which uses within- 
word and between-word phonological rules to produce word pronunci- 
ations expected to be encountered in fluent continuous speech, 

d) A fast lexical retrieval component, which can efficiently find
words in the vocabulary that match well acoustically with the
speech input and which accounts for context-dependent across-
word phonological effects,

e) An analysis-by-synthesis word verification component, which can
synthesize the expected parametric representation of a hypothe-
sized word (and its context) and compare it with the input param-
eters,

f) A grammar for interactions with a travel budget management system
in natural English using a vocabulary of over 1000 words,

g) A bi-directional parser for ATN grammars, which can parse a sen- 
tence from left-to-right, right-to-left, or middle-out, 

h) A semantic network knowledge base, which contains general knowledge
about trips and places, as well as specific information about plan-
ned trips, estimated costs, budgets, expenditures, etc., and

i) A flexible control component, which uses the other components to
formulate, evaluate, and extend hypotheses into a complete inter-
pretation of the sentence.

HWIM's speech understanding is set in the context of a travel
budget manager's automated assistant, which keeps track of trips taken
and planned and the budgets to which trip costs are charged, and it
also allows the user to plan new trips. Users may interact with HWIM
by speaking sentences from a rather general grammar (over 1000 words,
with a high average branching ratio and rejoining paths) forming a
subset of natural English. Typical sentences from this task are:

How much is left in the speech understanding budget? 

List all trips to California this year. 

What is the round-trip fare to Chicago? 

Cancel Jerry's trip to the ASA meeting. 

At the end of the SUR project in October 1976, HWIM correctly under- 
stood about half of its test utterances, spoken by three speakers. 
(1,4,7-13,16,18,19,23-29) 

Continuous speech understanding systems with the capabilities 
of HWIM and the other ARPA SUR project systems are not yet ready for 
immediate application, but that was not the goal of the ARPA SUR pro- 
ject. That goal was the development of an advanced technology of speech 
recognition and understanding. The technology developed during the 
ARPA SUR project has clear utility in speech recognition and under-
standing applications that should be practical in the immediate future.

1.2 Speech Bandwidth Compression

BBN has been doing research in the speech compression area 
since 1972, with support from ARPA, and more recently from other spon- 
sors also. BBN has been and is currently involved in developing speech 
compression systems with a wide range of transmission bit rates, rang- 
ing from 75 to 16000 bits/sec, and with different operating conditions 
such as noisy or high-quality input speech, noisy or noise-free trans- 
mission channel, and fixed-rate (synchronous) or variable-rate (asyn- 
chronous) transmission. (2,9-13,16,21,22) 

The overall goal of the ARPA speech compression research has
been to develop linear predictive speech compression (LPC) systems that
transmit good quality speech at low data rates. Speech compression
techniques developed in this project have been designed for use
in the ARPA Network environment of packet-switched data communications,
though they are easily extendible to other communications environments.

Recently developed techniques in linear prediction are used 
for the analysis and synthesis. We have developed several methods for 
reducing the redundancy in the speech signal without sacrificing speech 
quality. Included among these methods are preemphasis of the incoming 
speech signal, adaptive optimal selection of predictor order, optimal 
selection and quantization of transmission parameters, variable frame 
rate transmission, optimal encoding, and improved synthesis methodology. 
When we incorporated all of these in a floating point simulation of a 
pitch-excited linear predictive vocoder, we obtained synthesized speech 
with high quality at average transmission rates as low as 1500 bits/
sec (21,22). Our more recent results include: development of a new
class of stable linear predictive speech analysis methods (12); speci-
fications for an asynchronous or variable data rate linear predictive
speech compression system to be implemented by the various ARPA-spon-
sored sites for real-time speech transmission over the ARPA Network;
and application of nonlinear spectral warping techniques either to
improve speech quality at a given bit rate or to lower the transmission
bit rate at a given speech quality.
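
As a minimal sketch of two of the steps named above, preemphasis and
linear predictive analysis, the following Python code applies a
first-order preemphasis filter and then solves for the predictor
coefficients by the autocorrelation method (Levinson-Durbin recursion).
This is an assumed textbook formulation, not the analysis method used
in the BBN systems; the function names, the preemphasis coefficient of
0.9, and the predictor order are illustrative choices only.

```python
def preemphasize(samples, coeff=0.9):
    """First-order preemphasis: y[n] = x[n] - coeff * x[n-1].

    The coefficient 0.9 is an assumed, illustrative value.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def autocorrelation(samples, max_lag):
    """Return the short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (a, k, err) where x[n] is predicted by sum(a[i] * x[n-1-i]),
    k holds the reflection coefficients, and err is the final
    prediction error energy.
    """
    a = []
    k = []
    err = r[0]
    for m in range(order):
        if err <= 0.0:
            # Degenerate frame (for example, silence): pad with zeros.
            a += [0.0] * (order - m)
            k += [0.0] * (order - m)
            break
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        km = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[i] - km * a[m - 1 - i] for i in range(m)] + [km]
        k.append(km)
        err *= (1.0 - km * km)
    return a, k, err


if __name__ == "__main__":
    frame = preemphasize([0.0, 0.4, 0.9, 0.7, 0.1, -0.5, -0.8, -0.4])
    a, k, err = levinson_durbin(autocorrelation(frame, 2), 2)
    print(a, k, err)
```

With the autocorrelation method, every reflection coefficient has
magnitude below one for a non-degenerate frame, which is the usual
condition for a stable synthesis filter.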

One of the major results of the ARPA speech compression pro- 
ject has been to demonstrate real-time speech transmission on the 
packet-switched ARPA Network. BBN participated in the implementation
of the SPS-41-based initial system. More recently, a real-time system
specified by BBN, transmitting at an average rate of 2200 bits/sec,
has been implemented on a Floating Point Systems AP-120B at Information
Sciences Institute. The system will be implemented at BBN on the AP-
120B we are about to receive.
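
An average rate, as opposed to a fixed one, is what a variable frame
rate scheme produces: a frame of parameters is sent only when the
speech spectrum has changed enough since the last transmitted frame.
The Python sketch below illustrates the idea under stated assumptions.
The threshold rule, the Euclidean distance measure, and all names and
numbers are hypothetical and are not taken from the BBN specification.

```python
def select_frames(frames, threshold):
    """Return the indices of the frames that would be transmitted.

    A frame is sent when its Euclidean distance from the last
    transmitted frame exceeds `threshold`; the first frame is
    always sent.
    """
    sent = []
    last = None
    for idx, frame in enumerate(frames):
        if last is None:
            changed = True
        else:
            dist = sum((a - b) ** 2 for a, b in zip(frame, last)) ** 0.5
            changed = dist > threshold
        if changed:
            sent.append(idx)
            last = frame
    return sent


def average_bit_rate(num_sent, bits_per_frame, num_frames, frame_interval_s):
    """Average bits/sec over the whole utterance."""
    duration_s = num_frames * frame_interval_s
    return num_sent * bits_per_frame / duration_s


if __name__ == "__main__":
    # Hypothetical one-parameter frames; real frames would carry
    # several spectral parameters plus pitch and gain.
    frames = [[0.0], [0.05], [0.5], [0.52], [1.0]]
    sent = select_frames(frames, threshold=0.1)
    print(sent, average_bit_rate(len(sent), 50, len(frames), 0.01))
```

Steady sounds such as sustained vowels generate few transmitted frames
and rapid transitions generate many, so the instantaneous rate varies
while the long-term average stays low.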

Our work on speech compression also includes the development 
of objective procedures for testing the quality of vocoded (or compressed) 
speech (15,20). Since the objective procedures must be validated
against results from subjective listening tests, we also have a program
for the subjective evaluation of speech quality. We have explored the
perceptual dimensions of speech quality by multidimensional scaling
methods (2).

1.3 Very-Low Rate Vocoder

An interesting outgrowth of our work in speech understanding,
speech compression, and speech synthesis was a project combining them
in a phonetic speech transmission system operating at 75 bits per
second (14). Based on this pilot project, we have proposed a real-time
implementation for such a system.

1.4 Speech Synthesis by Rule 

Our experience in speech synthesis is derived mainly from
the research in synthesis-by-rule being carried out by Dennis Klatt
at MIT and at BBN (6,7). In our speech understanding system, synthesis
played two roles, as a voice response component and as a component of
an acoustic-phonetic word verifier, in which a hypothesized word (plus
context, if any) was synthesized into an idealized time-varying spec-
tral representation that was then compared against the analyzed utter-
ance itself. In this way, generative acoustic-phonetic knowledge was
used to evaluate how well a hypothesized word matched a portion of the
utterance (1,4,5). In the phonetic speech transmission system, the re-
ceiver used a modification of the synthesis-by-rule program to resyn-
thesize speech from the transmitted values of phoneme identity, duration,
and fundamental frequency (14).
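
The word verification idea described above, comparing a synthesized
spectral representation of each hypothesized word against the analyzed
input, can be illustrated with a small Python sketch. It assumes the
synthesized and observed representations are already the same length
and uses a plain mean squared distance. The actual HWIM verifier is not
described at this level of detail here, so the scoring rule, the
function names, and the toy data are all hypothetical.

```python
def frame_distance(a, b):
    """Squared Euclidean distance between two spectral frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_score(synthesized, observed):
    """Mean squared distance over frames; lower means a better match.

    Assumes both sequences have the same number of frames, i.e. no
    time alignment is attempted in this sketch.
    """
    if not synthesized or len(synthesized) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(frame_distance(s, o) for s, o in zip(synthesized, observed))
    return total / len(synthesized)


def best_hypothesis(hypotheses, observed):
    """Pick the word whose synthesized frames are closest to the input.

    `hypotheses` maps each candidate word to its synthesized frames.
    """
    return min(hypotheses, key=lambda w: match_score(hypotheses[w], observed))


if __name__ == "__main__":
    observed = [[1.0, 0.0], [0.0, 1.0]]
    hypotheses = {
        "trip": [[1.0, 0.0], [0.0, 0.9]],
        "budget": [[0.0, 1.0], [1.0, 0.0]],
    }
    print(best_hypothesis(hypotheses, observed))
```

A practical verifier would also need to align the synthesized and
observed frames in time before scoring, which this sketch omits.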

1.5 Instructional Aids Systems 

The instructional aids systems are self-contained computer- 
based systems for real-time speech analysis and display. A minicom- 
puter receives information about speech-related waveforms via micro- 
phones and accelerometers connected to analog and digital preprocessing 
circuits. Algorithms for analysis and display operate on the data, 
sometimes under the control of the user, in such a way as to provide 
concurrent visual and auditory representations of speech sound that
may be useful to the user in the modification of his articulation.

The second-language training system is designed to supple-
ment the standard language laboratory. It allows a student to visually
compare his efforts with pre-recorded teacher's versions. This system
has been evaluated in the context of two language pairs: English
speakers learning Chinese and Spanish speakers learning English (3).

The deaf-training system involves a trained teacher working
with the student, with the system operating as a tool to enhance their
interaction. In this case, attempts have been made to develop displays
that are appropriate for use with very young children with severe lan-
guage limitations as well as profound hearing losses. The prototype
system is now being tested at two schools for the deaf (17).

1.6 Other Projects

Other projects dealing with voice technology include: 

- adapting our variable frame rate speech compression 
approach to fixed rate transmission operating at 2400 
bits/sec over a noisy transmission channel, 

- ultra-high quality analysis/synthesis of telephone
quality speech at 16000 bits/second or less, where the
resynthesized speech must be equal in quality to the
original input, and

- an investigation of how the psychological state of
the user may be reflected in his speech characteristics.

2. PRESENT PROJECTS IN VOICE TECHNOLOGY

With one exception, our current research projects in speech 
processing are continuations of some of the projects described above. 

Our work in low rate speech compression continues in the 
direction of improving the quality of vocoded speech without sacri- 
ficing low data transmission rates. Presently under advanced testing 
is an improved voice source model incorporating both periodic and noise 
components, which largely eliminates the "buzziness" often associated 
with vocoded speech. We will also be bringing into real-time vocoder 
implementation many of the quality improvement techniques already dem-
onstrated in our floating point vocoder simulations. We also expect
to be starting work on high-quality speech synthesis of the type re-
quired for a very-low-rate phonetic vocoder system.

Also continuing are the projects on: 

- variable-to-fixed rate transmission over a noisy channel 

- ultra-high quality analysis/synthesis at a 16 kbit rate 

- vocal indicators of the speaker's psychological state. 

One new project, not mentioned above, is to develop a pro-
cessing system to improve the intelligibility of speech that has been
corrupted by wideband noise.

3. ANTICIPATED CAPABILITIES IN VOICE TECHNOLOGY

3.1 Staff


With its experience in a wide variety of projects dealing 
with voice processing, BBN numbers among its staff many with training 
and experience in the field. In 1977, 11 full-time scientists and 3
regular consultants are engaged in voice technology research and devel- 
opment, almost all of these with advanced degrees. We expect to main- 
tain at least this level of staffing in the foreseeable future. BBN's 
Information Sciences Division, within which our speech projects are 
based, numbers over 100 scientists from a broad variety of fields, 
particularly computer science, artificial intelligence, computational
linguistics, electrical engineering, and the behavioral sciences. 

3.2 Facilities


The BBN Research Computer Center (RCC) has four DEC PDP-10's
and one DECsystem-20. Three of the PDP-10's run TENEX, a virtual mem-
ory time sharing system developed by BBN. The other PDP-10 and the
DECsystem-20 run TOPS-20, a DEC-supported time sharing system based
on TENEX. Much of the speech processing work not requiring real-time
processing is carried out on the KL 10/90T system which runs TOPS-20.
All of the program libraries used in the speech and signal processing
are runnable on both TENEX and TOPS-20.

BBN's Speech Processing Laboratory contains equipment for
speech signal acquisition, display, editing, storage, and playback,
and it provides a facility for advanced real-time speech processing
systems research and development. It currently includes a DEC PDP-11/
40, a Signal Processing Systems Inc. SPS-41 signal processor (including
dual A/D and D/A converters), and an Imlac PDS-1 graphics display pro-
cessor. Delivery of a Floating Point Systems Inc. AP-120B array pro-
cessor is scheduled for the beginning of calendar 1978; this addition
will substantially enhance our real-time processing capabilities. The
PDP-11 system is connected to the ARPANET, which is used for data and
program transfers to and from the RCC or any other site on the ARPANET,
and for packet speech experiments for our continuing speech compression
projects. The Laboratory also contains audio equipment for producing,
manipulating, and recording audio signals.